DOCUMENT RESUME

ED 318 801                                TM 014 935

AUTHOR        Linacre, John M.
TITLE         Designing Your Own Rasch Analysis Program.
PUB DATE      Apr 90
NOTE          20p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (Boston,
              MA, April 16-20, 1990).
PUB TYPE      Reports - Evaluative/Feasibility (142);
              Speeches/Conference Papers (150)
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   Algorithms; *Computer Assisted Testing; Computer
              Graphics; Computer Simulation; Error of Measurement;
              Goodness of Fit; Item Bias; *Item Response Theory;
              Maximum Likelihood Statistics; *Programming;
              Statistical Bias
IDENTIFIERS   *Rasch Model



ABSTRACT 



Advantages and disadvantages of standard Rasch analysis computer programs are
discussed. The unconditional maximum likelihood algorithm allows all
observations to participate equally in determining the measures and
calibrations, which can be obtained quickly from a data set. On the advantage
side, standard Rasch programs can be used immediately, are debugged and
accurate, and can report statistics that are difficult to calculate. On the
disadvantage side, the user must have the correct hardware, conclusions based
on numbers appearing in the output are not necessarily valid, and the effort
involved in producing one's own program can be considerable. Guidelines for
writing programs, including those that surpass standard programs, are
provided; and sample output from a number of standard programs is examined for
strong and weak points. Emphasis is on adequate and useful statistics
presented as easily comprehended graphical output. Topics discussed include:
estimated measure, standard error, goodness of fit, plotting fit, presenting
the variable, and differential item functioning or bias. Eleven figures and a
BASIC program for performing unconditional maximum likelihood estimations are
provided. (TJH)






Designing Your Own Rasch Analysis Program



John M. Linacre 

MESA Psychometric Laboratory 
Department of Education 
University of Chicago 



Paper presented at 
American Educational Research Association Annual Meeting 

Boston, Massachusetts 
April 1990 






The advantages and disadvantages of standard Rasch analysis computer programs are
discussed. Sample output from a number of standard programs is examined for
strong and weak points, and for the guidance it gives to a program author.
Emphasis is laid on adequate and useful statistics presented as easily
comprehended graphical output. Source code for a simple Rasch analysis program
is provided.

Key words: Rasch Measurement, Computers, Software



Introduction 

There are two primary motivations for designing your own Rasch analysis program.
First, ease of use. It may not be convenient to use a general-purpose program
because your input data format is non-standard or the results of your analyses
need to be integrated into a testing or item banking system. Second, usefulness
of output. The output of a general-purpose program may not be in the most useful
form to determine the implications of your analyses or to communicate your
results to the non-specialist.

The ever wider dissemination of computer power has had considerable
impact on the ease of application of Rasch measurement methods. Data analysis
which formerly required mammoth centralized computer equipment can now be
performed more easily, quickly and flexibly on a computer located on the
analyst's own desk and under the analyst's complete control.

In step with this revolution in hardware, Rasch analysis software has also been
going through a revolution. As Wright recounts in his Afterword to Rasch (1980,
p. 188), the earliest analytical approaches were either easy to compute but
inefficient in their use of the information in the observations, or prohibitively
demanding in their computational requirements when applied to the data sets
generated in educational testing situations.

This computation problem has been solved by the development of the unconditional
maximum likelihood algorithm (UCON) (Wright and Panchapakesan 1969). At the
expense of a slight and usually insignificant statistical bias, all the
observations can participate equally in determining the measures and
calibrations, which can be obtained quickly from a data set in a
computationally manageable way.

The UCON algorithm has been extended to rating scales and other non-dichotomous
observations (Wright and Masters 1982), has proved robust against missing
observations (Wright and Linacre 1984), and is also of direct application to data
sets more complex than the conventional two-facet (persons and test items)
testing situation (Linacre 1989). A simple approximation to it, PROX, has long
been in use (Cohen 1979, Wright and Stone 1979).
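PROX is simple enough to show in full. The sketch below is a Python rendering for complete dichotomous data, assuming the normal-approximation formulas usually given for PROX (Wright and Stone 1979): initial logits are computed from raw scores, and each facet is then expanded to allow for the spread of the other (2.89 and 8.35 are approximately 1.7 squared and 1.7 to the fourth power). The function name `prox` is hypothetical, sample variances are an assumed choice, and missing responses and zero or perfect scores are not handled.

```python
import math
import statistics


def prox(responses):
    """PROX (normal approximation) estimates for complete dichotomous data.

    responses: list of person rows, each a list of 0/1 scores.  No missing
    data and no zero or perfect scores (for persons or items) are allowed.
    Returns (person_measures, item_difficulties) in logits, items centered on 0.
    """
    n_persons, n_items = len(responses), len(responses[0])
    person_scores = [sum(row) for row in responses]
    item_scores = [sum(row[i] for row in responses) for i in range(n_items)]

    # Initial item logits (centered on 0) and initial person logits from raw scores
    x = [math.log((n_persons - s) / s) for s in item_scores]
    mean_x = statistics.fmean(x)
    x = [xi - mean_x for xi in x]
    y = [math.log(r / (n_items - r)) for r in person_scores]

    # U: spread of the item logits; V: spread of the person logits
    u = statistics.variance(x)
    v = statistics.variance(y)
    denominator = 1.0 - u * v / 8.35               # 8.35 is about 1.7 ** 4
    if denominator <= 0:
        raise ValueError("PROX expansion factors are undefined for such wide spreads")

    # Persons are expanded by the item spread, items by the person spread
    person_expansion = math.sqrt((1.0 + u / 2.89) / denominator)  # 2.89 = 1.7 ** 2
    item_expansion = math.sqrt((1.0 + v / 2.89) / denominator)

    difficulties = [item_expansion * xi for xi in x]
    measures = [person_expansion * yn for yn in y]
    return measures, difficulties
```

Applied to a small complete data set, the item difficulties come out centered on zero and ordered from most to least often answered correctly, and persons with equal raw scores receive equal measures.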

With increasing computer power available on the desk of the analyst, there is no
longer any need to be bound by the complexities and limitations of programs
written for a mainframe environment. Even the word "programming" is now somewhat
misleading, as useful Rasch analysis, or rearrangement of Rasch output, can be
performed using the programming capabilities of many statistical packages,
computerized spreadsheet programs, and even the facilities of word processor
software.



There are a number of advantages associated with off-the-shelf software, but 
these advantages have traps for the unwary. 

1) The program can be used immediately.

    But the requirement is that you have the correct hardware, or a compatible
    compiler for programs provided in source code form. Integrating the input
    with a data collection system, or the output with a reporting system, is
    often an irksome task.

2) The program is debugged and accurate.

    Bugs, however, exist in all programs, so it always pays to maintain a
    reasonable skepticism about program output. It is wise to check your
    program output against that of published data sets, such as "Knox Cube
    Test" (Wright and Stone, 1979) and "Liking for Science" (Wright and
    Masters, 1982), in order to verify the results. Do not be concerned about
    small numerical discrepancies in your measures, substantially less than
    the standard errors, since all results are estimates depending on options
    selected by the analyst.

3) The program reports statistics that are difficult to calculate. 

    But just because a number appears on the output, not all conclusions based
    on it are necessarily valid. The interpretation of any statistic depends
    on some underlying distributional requirements. To the extent these are
    not met, the statistics may not have the meaning imputed to them.

4) The effort to produce your own program may be considerable. 
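The check recommended under point 2 is easy to automate. The Python sketch below assumes both sets of estimates are keyed by the same item labels and that the published standard errors are at hand; `flag_discrepancies` is a hypothetical name, and the threshold of half a standard error is an illustrative reading of "substantially less than the standard errors", not a figure from the text. Because programs may anchor the scale at different origins, the mean difference is removed first.

```python
def flag_discrepancies(mine, published, standard_errors, fraction=0.5):
    """Return the labels whose estimates differ from the published values by
    more than `fraction` of the standard error, after aligning the origins.

    mine, published, standard_errors: dicts keyed by item (or person) label.
    """
    common = [label for label in mine if label in published]
    # Programs may center the scale differently, so remove the mean shift first
    shift = sum(mine[k] - published[k] for k in common) / len(common)
    return [
        k for k in common
        if abs(mine[k] - shift - published[k]) > fraction * standard_errors[k]
    ]
```

An empty list means the two programs agree to within the chosen fraction of the standard errors once their origins are aligned.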



Writing your own program

This is not as difficult as it would appear from a cursory glance at the
thousands of lines of computer code in the typical mainframe Rasch computer
program. Often, these programs attempt to incorporate as wide a range of input
data observation formats and output reporting options as possible into one
program. As an analyst, you are unlikely to use more than a small subset of the
available options of such a program.

The estimation routine itself, including simple input and output subroutines,
can be compressed into a couple of pages of code. All the required information
is in Wright and Masters (1982), which presents the mathematics for the UCON
estimation equations, and also for the standard Rasch fit statistics. To get a
flying start, you can even modify the source code of an existing program (see
Appendix 1).
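To show how compact the core can be, here is a minimal Python sketch of a UCON estimation loop for dichotomous data. It is written from the usual form of the joint estimation equations (observed score equals expected score for every person and every item) and is not a translation of the Appendix 1 program. The names `ucon`, `newton_step` and `probability` are hypothetical, the one-logit limit on each Newton-Raphson step is an assumed safeguard, and no correction is made for the slight statistical bias of UCON estimates.

```python
import math


def ucon(responses, max_iter=200, tolerance=0.0001):
    """Unconditional (joint) maximum likelihood estimates for dichotomous data.

    responses: list of person rows; each entry is 1 (correct), 0 (incorrect)
    or None (not administered).  Every person and item must have at least one
    response, and zero or perfect scores must have been set aside beforehand.
    Returns (person_measures, item_difficulties) in logits, items centered on 0.
    """
    n_items = len(responses[0])
    measures = [0.0] * len(responses)
    difficulties = [0.0] * n_items

    def probability(b, d):
        # Rasch probability of success for ability b on an item of difficulty d
        return 1.0 / (1.0 + math.exp(d - b))

    def newton_step(pairs):
        # pairs: (response, probability) for one person or one item.
        # (observed - expected score) / model variance, limited to +/- 1 logit
        observed = sum(x for x, _ in pairs)
        expected = sum(p for _, p in pairs)
        variance = sum(p * (1.0 - p) for _, p in pairs)
        return max(-1.0, min(1.0, (observed - expected) / variance))

    for _ in range(max_iter):
        largest = 0.0
        for n, row in enumerate(responses):
            pairs = [(x, probability(measures[n], difficulties[i]))
                     for i, x in enumerate(row) if x is not None]
            step = newton_step(pairs)
            measures[n] += step
            largest = max(largest, abs(step))
        for i in range(n_items):
            pairs = [(row[i], probability(measures[n], difficulties[i]))
                     for n, row in enumerate(responses) if row[i] is not None]
            step = newton_step(pairs)
            difficulties[i] -= step          # more successes: an easier item
            largest = max(largest, abs(step))
        # Fix the origin of the scale at the mean item difficulty
        mean_d = sum(difficulties) / n_items
        difficulties = [d - mean_d for d in difficulties]
        measures = [b - mean_d for b in measures]
        if largest < tolerance:
            break
    return measures, difficulties
```

Missing responses are simply skipped in each sum, which is what lets the same loop handle incomplete tests.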

Some of the benefits you can expect to obtain are: 

1) Flexibility, which enables Rasch analysis to be incorporated seamlessly into
a test system. This speeds up and simplifies the measurement process.

    The data need not be reformatted to match the analysis program, and the
    output of the analysis program can be arranged to match the word
    processor or statistical package into which it is to be inserted.

2) Control over the computation. You know precisely what the program is doing,
    and can adjust it to take the action you wish on perfect scores, aberrant
    behavior (e.g. excessive guessing), and incomplete tests. You can also
    calculate statistics relating to fit, differential item functioning and
    other features that are of interest. You may even wish to incorporate an
    alternative estimation method (Kelderman and Steen 1988, Linacre 1989).

3) The analysis can take place in real time, simultaneously with
    computer-adaptive testing or optical scanning of score forms, enabling
    estimates to be made available immediately, while the candidate, original
    score form, or content specialist is still present. The measures can be
    used immediately by the examinees or decision makers, rather than long
    afterwards.

4) The program runs on your hardware, and can be modified to meet changes in
    your situation.
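As an example of the flexibility described under point 1, a reader for a site-specific data layout takes only a few lines. This Python sketch assumes the layout given in the comments of the Appendix 1 listing (person identifier in columns 1-30, then one scored response per column, "1" correct, "0" incorrect, anything else ignored); `read_fixed_columns` is a hypothetical name.

```python
def read_fixed_columns(lines, id_width=30):
    """Read scored responses laid out with a person identifier in the first
    `id_width` columns, then one scored response per column.
    '1' is correct, '0' incorrect; any other character is ignored (None).
    Returns (person_ids, responses), with rows padded to equal length."""
    person_ids, responses = [], []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue                      # skip blank records
        person_ids.append(line[:id_width].strip())
        responses.append([1 if ch == "1" else 0 if ch == "0" else None
                          for ch in line[id_width:]])
    n_items = max((len(row) for row in responses), default=0)
    # A short row means its trailing items were not administered
    responses = [row + [None] * (n_items - len(row)) for row in responses]
    return person_ids, responses
```

Changing `id_width` or the character tests adapts the reader to another layout without touching the estimation code.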



Surpassing the standard programs 

The decision to write your own Rasch program may be motivated by inadequacies in
the output of the standard programs for your purposes. Considering that the
standard programs have the reputation of being written by statistical and
computer experts, surpassing them may be regarded as unreasonably ambitious. But
this is not the case. The standard programs were often originally written to
meet certain specific requirements, and these may not match your requirements
closely. A fancy package or a cute name does not guarantee the most useful
results. The standard programs, however, can provide useful ideas and guidance
in designing your own program.

In principle, each component to be measured must be reported with at least three
pieces of information in order for meaningful conclusions to be drawn from an
analysis:

1) its estimated measure: its position on the linear scale. 

2) its reliability or standard error: how precisely that position is determined. 

3) its validity or fit: how accurately the measure represents the attribute of
    the person or item which is participating in the measuring process. (The
    fit analysis can become quite detailed and diagnostically useful as you
    tailor it to your particular data.)

Standard programs provide this information in a variety of ways. As we use some
of them as examples, remember that only a short excerpt of the output of each
program is shown here, and that it is not necessarily representative of the
program's most recent version. The excerpts are intended to indicate features
you might wish to look for in standard programs and, when you design your own
program, features you might wish to include in it.



Estimated Measure, Standard Error and Fit



For ease of comprehension, it is useful to present all relevant information
together in one place in as clear a way as possible. The program output shown
in Figure 1 lacks a quantification of the standard error of the difficulty
calibrations, though, since it appears that each item was encountered by about
17 people, each standard error must be at least 0.5 logits. This means that
printing the difficulty calibrations to 3 decimal places is misleading. The
fit statistics shown are not easy to use without access to a chi-square table,
but appear to indicate an alarming degree of misfit. Figure 1 reinforces the
need for measure, standard error and fit in a clear and easily understood form.
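Both readings of Figure 1 can be checked with a little arithmetic. The Python sketch below assumes, as the text infers, that each item was taken by about 17 persons and that df = 17. A dichotomous response carries at most p(1 - p) = 0.25 units of statistical information, so the standard error cannot be smaller than 1 / sqrt(0.25 N); dividing each chi-square by its degrees of freedom gives a mean-square whose expected value is about 1.0, which needs no table. The function names are hypothetical.

```python
import math


def minimum_item_se(n_responses):
    """Smallest possible standard error (logits) of a dichotomous item
    calibration based on n_responses: each response carries at most
    p * (1 - p) = 0.25 units of information (when p = 0.5)."""
    return 1.0 / math.sqrt(0.25 * n_responses)


def mean_square(chi_square, df):
    """Chi-square divided by its degrees of freedom; near 1.0 for data
    that fit the model."""
    return chi_square / df


print(round(minimum_item_se(17), 2))          # 0.49
for item, chi_square in [(1, 78.659), (2, 25.695), (3, 43.738)]:
    print(item, round(mean_square(chi_square, 17), 2))
```

With the Figure 1 values this gives a minimum standard error of about 0.49 logits and mean-squares of about 4.63, 1.51 and 2.57, consistent with the text's reading of substantial misfit.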

The output shown in Figure 2 lacks any fit statistics, though fit can be
estimated from other parts of the program output. It does have the advantage of
including some facility for more detailed identification of the items
(unfortunately limited here to the unimaginative "X1"), and also details on the
numbers used to calculate the estimates (355 cases and the "ITEM P", the
proportion of correct responses). Again, three decimal places are printed,
giving a misleading impression of the precision of the estimates.

Figure 3 is an example of the comprehensive approach necessary for a mainframe
program which is attempting to answer every conceivable question an analyst might
pose, and yet still falls short. Measures, standard errors and fit statistics
are presented, but they may not be presented in the most useful way, or the fit
statistics may not be the most useful ones for the analysis currently being
undertaken (Smith 1989). The large number of statistics, and their sometimes
obscure definition and interpretation, can give the outsider the impression that
Rasch analysis is too awkward to use and too complex to understand. Indeed, even
the experienced analyst can sometimes be misled. For instance, a minor
redefinition of the manner in which the person sample is divided into ability
group levels for each item (3 groups are shown in Figure 3) can have a major
impact on the values of the "Between" t-statistics for any particular item.
Figure 3 also includes a traditional statistic, the point biserial correlation,
which is useful not for its magnitude, but for its directional diagnostic
information. If an MCQ item has been incorrectly keyed, or the direction of a
rating scale item is reversed, then the immediate symptom is often a negative
correlation.
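The directional check described above needs only a correlation. This Python sketch computes the point biserial as the Pearson correlation between 0/1 responses to an item and the persons' raw total scores (here uncorrected, i.e. including the item itself, an assumed simplification), and lists items whose correlation is negative. The function names are hypothetical.

```python
import math


def point_biserial(item_responses, total_scores):
    """Pearson correlation between 0/1 responses to one item and the persons'
    total scores.  Raises ZeroDivisionError if either list is constant."""
    n = len(item_responses)
    mean_x = sum(item_responses) / n
    mean_y = sum(total_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_responses, total_scores))
    sxx = sum((x - mean_x) ** 2 for x in item_responses)
    syy = sum((y - mean_y) ** 2 for y in total_scores)
    return sxy / math.sqrt(sxx * syy)


def suspicious_items(responses):
    """Indices of items whose point biserial with the raw total score is
    negative, a common symptom of a miskeyed or reversed item."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [i for i in range(n_items)
            if point_biserial([row[i] for row in responses], totals) < 0]
```

Only the sign is used for flagging, in keeping with the point that the magnitude is less informative than the direction.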



Production of Rasch estimates is well enough understood for practical purposes.
The challenge now is how best to extract useful information and to present the
results of the analysis in a manner comprehensible to test users and
decision-makers. A listing of numerical output is often daunting even to the
analyst; it is usually overwhelming and incomprehensible to the non-statistician.

Rasch analysis programs themselves go some way toward producing graphical as
well as numerical output. Figure 4 shows rudimentary frequency distributions
which summarize long tables of numbers in such a way that their general form
can be understood more easily. The number of persons at each location on the
scale is read vertically. This graphical output provides guidance as to where
to investigate further, such as into the skewed distribution of person measures
or into the obvious fit outliers, but detailed information has been lost.
Figure 4 is of the "one size fits all" school of thought. The ranges of the
scales are selected by the program based on the data and may not be the most
meaningful. For instance, if the standard error scale were the same as that of
the logit measures, a good visual indication would be obtained of the precision
of the measures.
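A program of your own can let the analyst choose the scale. The Python sketch below prints a one-line text distribution with one character per bin, where the range and bin width are supplied by the caller, so measures and standard errors can be plotted against the same logit scale as suggested above. The function name and the choice to show counts above 9 as "9" are assumptions for brevity; Figure 4 instead stacks multi-digit counts vertically.

```python
import math
from collections import Counter


def text_distribution(values, low, high, width):
    """One-line text histogram: one character per bin of `width` units from
    `low` to `high`.  Each character is the count in that bin (shown as 9 if
    larger), or '.' for an empty bin.  Out-of-range values go to the end bins."""
    n_bins = round((high - low) / width)
    counts = Counter()
    for value in values:
        k = math.floor((value - low) / width)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return "".join(str(min(counts[k], 9)) if counts[k] else "."
                   for k in range(n_bins))
```

Calling it twice with the same `low`, `high` and `width`, once for the measures and once for their standard errors, gives directly comparable rows.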



Plotting Fit 

Figure 5 goes further into detail in an attempt to present graphically the fit
information relating to an item. Overfit, or dependency in the data, would be
indicated by an improbably close alignment between the observed 'X's and the
expected '*'s; excessively random behavior by large discrepancies between the
'X's and '*'s. The degree of misfit near the item difficulty would be
emphasized in the usual "information-weighted" fit statistic; misfit more
distant from the item difficulty would be emphasized by the usual
"outlier-sensitive" fit statistic.
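The two kinds of fit statistic differ only in how the squared residuals are combined. This Python sketch uses the mean-square forms as commonly defined for dichotomous data (see Wright and Masters 1982): outfit is the unweighted mean of squared standardized residuals, infit divides the sum of squared residuals by the sum of the model variances. The function name is hypothetical, and the model probabilities are assumed to come from an estimation routine such as UCON.

```python
def fit_mean_squares(observed, probabilities):
    """Infit and outfit mean-squares for one item (or person), dichotomous data.

    observed: 0/1 responses; probabilities: model probabilities of success
    for the same responses.  Both statistics have expectation near 1.0."""
    variances = [p * (1.0 - p) for p in probabilities]
    squared_residuals = [(x - p) ** 2 for x, p in zip(observed, probabilities)]
    # Outfit: unweighted mean of squared standardized residuals, so one very
    # unexpected response far from the item difficulty can dominate it
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(observed)
    # Infit: squared residuals weighted by information (the model variance),
    # so responses from persons near the item difficulty count most
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit
```

For example, a single failure by a person whose success probability was 0.9 inflates outfit much more than infit.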

This suggests an improvement. Since the discrepancies are what we are interested
in, perhaps they are what should be plotted, in order to make the meaning more
immediately comprehensible. Following Tukey's methods, Figure 6 is an attempt
to clarify what is important about Figure 5, that is, the differences between
what is observed and what is expected for each of the ability groups. The plot
is now less cluttered with redundant 'X's, and the logistic ogive remains.

Figure 6 clearly indicates the size and the direction of the discrepancies, but
there is no indication as to their significance. Figure 7 is an attempt to
remedy this by presenting the residuals in a standardized form which takes into
account the number of scores in each ability group. For each ability group, the
standardized residual is the observed score minus the expected score, divided by
the standard error of the observed score given the expected. This was done
using the numbers provided elsewhere on the output from which Figure 5 was
extracted. It so happens that, for this small data set, the somewhat alarming
shapes presented in Figures 5 and 6 have little statistical significance. Since
we have now lost the ogival shape, the location of the item difficulty is shown
for reference. Further improvements could still be incorporated into your own
version of this. For instance, interpretation is simplified if the ability axis
is reversed, to put high abilities at the top.
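The standardized residual defined above translates directly into code. In this Python sketch the standard error of the observed group score is taken as the square root of the summed binomial variances p(1 - p) of the group's responses, assuming independent responses under the model; forming the ability groups is left to the caller, and the function name is hypothetical.

```python
import math


def standardized_group_residual(responses, probabilities):
    """(observed score - expected score) / standard error of the observed score,
    for one ability group's 0/1 responses to an item and the model
    probabilities of success for those same persons."""
    observed = sum(responses)
    expected = sum(probabilities)
    variance = sum(p * (1.0 - p) for p in probabilities)
    return (observed - expected) / math.sqrt(variance)
```

For a group of four persons, each with a 0.8 probability of success, two correct answers give (2 - 3.2) / 0.8 = -1.5, which on its own is not a large discrepancy.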



Presenting the Variable 

However well-fitting the observations are to the measurement model, they have
little use if the resulting measures are not of the variable that was intended
by the test developer. Thus one could imagine an advanced physics test in
multiple-choice form given to grade school children. Whatever numbers emerged
would not be measures of physics capability, but might be indications of
"test-wiseness".

Consequently, a necessary stage in analyzing a test is verifying the construct
validity of the test. Figure 8 presents one way of doing this. This output could
be presented to content specialists who are not comfortable with heavily numeric
output. For construct validity, the vertical order of the items should match the
curriculum or test design scheme of the content specialists. The horizontal
placement of the persons shows what they can be expected to accomplish. By
entering a vertical line at -0.2 logits, I have indicated which items would be
expected to be answered correctly by an average person in this sample.
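The vertical line works because, under the Rasch model, a person is more likely than not to succeed on exactly those items calibrated below the person's measure. This Python sketch makes that explicit; the item labels and difficulties in the usage line are hypothetical, and only the -0.2 logit ability comes from the text.

```python
import math


def success_probability(ability, difficulty):
    """Rasch probability of a correct answer, both values in logits."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def items_expected_correct(ability, difficulties):
    """Items a person at `ability` is more likely than not to answer
    correctly, i.e. those calibrated below the person's measure."""
    return [item for item, d in difficulties.items()
            if success_probability(ability, d) > 0.5]


# Hypothetical calibrations, for illustration only
print(items_expected_correct(-0.2, {"a": -1.0, "b": 0.5, "c": -0.3}))
```

Here items "a" and "c" lie below -0.2 logits and are the ones an average person would be expected to answer correctly.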

An advance over Figure 8 would be to position the items vertically so that their
spacing corresponded to their calibration. This would have the effect of placing
the 'I's in Figure 8 on the identity line. Figure 9 is a representation of the
output in this revised form, an idea I obtained from the similar horizontal
placement in the "KeyMath" score form (Connolly et al. 1976). The logit scale
has been transformed into a less mathematical-looking scale, and the items
positioned according to their calibrations. The measures for each possible
score, as well as reasonable estimates for extreme scores, have also been placed
on the vertical axis, but in a separate column for ease of use. This form acts
as its own measuring device and replaces the requirement for computer analysis
for items which have already been calibrated. It is a Rasch analysis program
which does not need a computer! Figure 10 (Stone and Wright, 1980) has taken
this idea further to summarize the items in terms of criteria, and to include
norm-referenced information for the measures.
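The rescaling used for Figure 9 is a linear transformation. This Python sketch takes its constants from the Figure 9 caption (mean item difficulty = 70 units, 1 logit = 10 units); the function names and the `mean_item_difficulty` parameter are hypothetical, included so that logits from a program that does not center on the items can still be converted.

```python
def logits_to_units(logit, mean_item_difficulty=0.0, center=70.0, units_per_logit=10.0):
    """Convert a logit measure or calibration to the rescaled units."""
    return center + units_per_logit * (logit - mean_item_difficulty)


def units_to_logits(units, mean_item_difficulty=0.0, center=70.0, units_per_logit=10.0):
    """Inverse of logits_to_units."""
    return mean_item_difficulty + (units - center) / units_per_logit
```

A person at 1.5 logits, on a scale centered at the mean item difficulty, becomes 85 units. Because the transformation is linear, standard errors are simply multiplied by 10.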



Differential Item Functioning (DIF or bias)

Detecting the presence of DIF in an otherwise well-behaved item is challenging.
If DIF affects a reasonably large fraction of the examinee population, it must
be small. If it were large, then ordinary information-weighted Rasch fit
statistics would flag the item as one to which the examinees respond in an
apparently excessively haphazard way.

Detecting DIF could be done by constructing a program which partitions the
responses according to every available combination of background variables and
produces a separate item calibration for each partition. Then all items for
which there is a noticeable difference in calibrations between any pair of
partitions could be flagged. An interesting program along these lines is TITAN
(Grosse, 1990), which also reports on the degree to which DIF would explain
misfit observed in the data. Generally such a comprehensive analysis leads to
the discovery of a large number of apparently biased items. The bias itself,
however, often is not replicable. Experience teaches us that "most items flagged
have turned out to be unbiased" (Hills 1989).

Nevertheless, the suspicion of item bias remains, and so the hunt for DIF
continues. The computationally simplest method to detect item bias is to
partition the examinees: in one analysis include those in whose favor the bias
is thought to operate, and in another analysis those against whom the bias is
thought to operate. Include all the items, but, before the analysis, make a note
of those items thought to be biased for substantive reasons apart from the
accidents of the data in this test administration. Each analysis will provide
each item with an estimate and a standard error. Calculate the standardized
difference between the estimates or, better, plot the pairs of item
difficulties. Most item points can be expected to lie close to the identity
line. Draw in confidence bands based on the standard errors, as shown in
Figure 11. If some of the items previously noted for potential bias do not have
the greatest standardized differences, and so are not the most outlying, then
it is not clear that any items are biased. Of course, there will always be some
outlying items, but, unless there is external evidence, these can be expected
to differ between test administrations. This analysis and plotting procedure
can be automated using standard analysis software and statistical programs.
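The standardized difference between the two calibrations of an item is the difference divided by the joint standard error of the two independent estimates. This Python sketch applies it to every item; it assumes both analyses report item difficulties centered on zero over the same set of items, and the limit of 2.0 (roughly a two-sided 95% band) is an assumed choice, not one stated in the text. Function names and the item data in the test are hypothetical.

```python
import math


def standardized_difference(d1, se1, d2, se2):
    """Difference between two independent calibrations of the same item,
    divided by their joint standard error."""
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)


def flag_dif(group_a, group_b, z_limit=2.0):
    """group_a, group_b: dicts item -> (difficulty, standard_error) from the
    two separate analyses, each with item difficulties centered on 0.
    Returns the items whose standardized difference exceeds z_limit."""
    flagged = []
    for item in group_a:
        d1, se1 = group_a[item]
        d2, se2 = group_b[item]
        if abs(standardized_difference(d1, se1, d2, se2)) > z_limit:
            flagged.append(item)
    return flagged
```

The flagged list should then be compared with the items noted in advance for substantive reasons; as the text points out, outliers that were not anticipated are likely to change from one administration to the next.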

Designing your own Rasch software, in whole or in part, though requiring time and
effort, can give clear advantages over pre-packaged software. The advantages
are often both in terms of the quantity of information provided, and of the
quality and ease of use of that information.



Bibliography

Assessment Systems Corporation (1982) RASCAL computer program. Minnesota.

Cohen, L. (1979) Approximate expressions for parameter estimates in the Rasch
model. British Journal of Mathematical and Statistical Psychology 32:113-120.

Connolly, A.J., Nachtman, W., Pritchett, E.M. (1971, 1976) KeyMath diagnostic
arithmetic test. Circle Pines, Minnesota: American Guidance Service Inc.

Grosse, M.S. (1990) TITAN bias analysis computer program. Philadelphia: National
Board of Medical Examiners.

Hills, J.R. (1989) Screening for potentially biased items in testing programs.
Educational Measurement: Issues and Practice 8(4):5-11.

Kelderman, H. and Steen, R. (1988) LOGIMO computer program. Twente, the
Netherlands: University of Twente, Department of Education.

Linacre, J.M. (1988) FACETS computer program. Chicago: MESA Press.

Linacre, J.M. (1989) Rasch model parameter estimation with the extra-conditional
algorithm. Rasch SIG Newsletter 3(1), Spring.

Rasch, G. (1980) Probabilistic models for some intelligence and attainment tests.
Chicago: University of Chicago Press.

Smith, R. (1989) Item and person fit in the Rasch model. Paper presented at the
American Educational Research Association Annual Meeting, San Francisco.

Stenson, H. and Wilkinson, H. (1986) TESTAT computer program. Evanston: SYSTAT Inc.

Stone, M.H. and Wright, B.D. (1980) Knox's Cube Test, Instruction Manual, Cat.
No. 33920M. Chicago: Stoelting Co.

Wright, B.D., Congdon, R., Schulz, M. (1984) KSCALE computer program. Chicago:
MESA Press.

Wright, B.D. and Linacre, J.M. (1984) MicroScale computer program. Westport,
Conn.: Mediax Inc.

Wright, B.D., Linacre, J.M., Schulz, M. (1989) BIGSCALE computer program.
Chicago: MESA Press.

Wright, B.D. and Masters, G.N. (1982) Rating Scale Analysis. Chicago: MESA Press.

Wright, B.D., Mead, R.J., Bell, S.R. (1980) BICAL computer program. Chicago:
Statistical Laboratory, Department of Education, University of Chicago.

Wright, B.D. and Panchapakesan, N.A. (1969) A procedure for sample-free item
analysis. Educational and Psychological Measurement 29:23-48.

Wright, B.D. and Stone, M.H. (1979) Best Test Design. Chicago: MESA Press.






Rasch Model Item Calibration Program RASCAL (tm) Version 3.00
Final Parameter Estimates for Data from File Testdata.Dat

Item   Difficulty   Chi Sq.   df
  1       1.951     78.659    17
  2      -0.246     25.695    17
  3       1.827     43.738    17

Figure 1. Item calibrations output by the RASCAL Rasch analysis program.
Excerpted from Assessment Systems Corporation (1989) Computerized Testing
Products Catalog.



APPROXIMATE RASCH ITEM DIFFICULTY
DATA BASED ON 355 USABLE CASES

ITEM  LABEL  ITEM P  DIFFCLTY  STD ERR
  1    X1     .310     .810     .126
  2    X2     .273    1.026     .131
  3    X3     .524    -.276     .117

Figure 2. Item calibrations output by the TESTAT Rasch analysis program. 
Excerpted from SYSTAT Inc. (1986) TESTAT program manual. 





SEQUENCE  ITEM  ITEM        STANDARD
NUMBER    NAME  DIFFICULTY  ERROR
   4      IT04    -4.847     0.852
   5      IT05    -4.244     0.759
   6      IT06    -4.244     0.759

               ITEM CHARACTERISTIC CURVE      ITEM FIT STATISTICS
SEQ  ITEM     1ST    2ND    3RD     FIT T-TESTS   WTD    MNSQ   DISC   POINT
NUM  NAME    GROUP  GROUP  GROUP    BETWN  TOTAL  MNSQ   SD     INDX   BISER
 4   IT04    0.80   1.00   1.00     -1.34  0.25   1.04   0.53   1.09   0.40
 5   IT05    0.70   1.00   1.00     -1.05  0.62   1.21   0.42   1.11   0.42
 6   IT06    0.70   1.00   1.00     -1.05  0.39   1.11   0.42    ?     0.47

Figure 3. Item calibrations output by the BICAL Rasch analysis program.
Excerpted from University of Chicago, Department of Education, Statistical
Laboratory (1980) Research Memorandum 23C.



[Figure 4 plot: frequency distributions of person measures (logits), of logit
standard errors ("Logit S.E."), and of standardized infit statistics ("Infit
Standardized", -6 to 6); counts are read vertically.]

Figure 4. Frequency distributions output by the FACETS Rasch analysis program.
Excerpted from output of FACETS program (Linacre 1989).



ITEM 11   DIFFICULTY = -1.352   STD ERROR = .425

[Figure 5 plot: percent correct (0 to 100) for each ability group (ABIL), from
below -3.25 to above 3.25 logits.]

Figure 5. Graphical output from the TESTAT Rasch analysis program. The item
histogram shows percent correct scores, marked by "X"s, for each of 15 Rasch
ability-score intervals. "*" indicates the expected percent correct based on the
Rasch model. Rows with only an "*" are ambiguous, containing no observations or
no correct responses for persons in that ability interval. One histogram is
produced for each item.






ITEM 11   DIFFICULTY = -1.35   STD ERROR = .43

[Figure 6 plot: residual percent correct (axis 0 to 100) for the same 15
ability groups (ABIL).]

Figure 6. Plot of ability-group score residuals. This is an alternative version
of the output shown in Figure 5. The item plot shows the residuals between the
percent correct scores and the percentages expected for each of 15 Rasch
ability-score intervals. "*" indicates the expected percent correct based on
the Rasch model; a separate symbol indicates that no observations are recorded
for the ability level.



ITEM 11   DIFFICULTY = -1.35   STD ERROR = .43

[Figure 7 plot: standardized residuals (-2 to 2) for the same 15 ability groups
(ABIL).]

Figure 7. Plot of standardized score-group residuals. This is an alternative
version of the output shown in Figure 5. This item plot shows the standardized
residuals between the observed correct scores and the scores expected for each
of 15 Rasch ability-score intervals. "*" indicates the expected standardized
residuals based on the Rasch model. A separate symbol indicates that no
observations are recorded for the ability level, and another locates the item
difficulty.






[Figure 8 plot: items, identified by number and name (tapping sequences such as
"1-4" and "1-2-4"), listed from HARD at the top to EASY at the bottom against a
logit scale from -5.0 to 5.0, with the person distribution (PERSONS) below the
horizontal axis.]

Figure 8. Plot of items in order of calibration (excerpted from output of the
MESA Press (1989) BIGSCALE Rasch analysis program). The items are listed in
descending order of difficulty, with their calibrations indicated by "I" on the
horizontal scale. A separate symbol indicates an extreme score. The person
measure distribution is indicated below the horizontal axis.








[Figure 9 image: ITEM SCORE SHEET. Items are positioned vertically by
calibration on a scale running from 10 to 130 units, each with a box for marking
the answer; a separate column gives the SCORE - MEASURE ± S.E. conversion for
each raw score, including an estimate for the extreme score.]

Figure 9. An alternative presentation of the information in Figure 8 following
the "KeyMath" design. The boxes "[]" could be checked for correct answers,
and crossed "x" for incorrect ones, and then counted to obtain the score. The
scale is converted from logits: mean item difficulty = 70 units, 1 logit = 10
units. Unexpected answers more than 20 units from the overall measure would be
surprising, with a probability of less than 0.1.






[Figure 10 image: KNOX'S CUBE TEST, JUNIOR REPORT FORM.]

Figure 10. A Rasch-based scoring form tailored to meet the requirements of
potential users. (Excerpted from Stone and Wright, 1980.)







Figure 11. Differential item functioning. The calibrated difficulty of each item for boys is plotted against that for girls, together with the identity line and a confidence band. The indicated items lie outside the confidence band, and there was reason to suppose, apart from this analysis, that they were functioning in favor of boys ("B") or girls ("G").
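A numerical counterpart to the plot in Figure 11 can be sketched as follows. This is an assumption-laden illustration, not the procedure used for the figure: it puts both calibrations on a common origin by centering them, then flags items whose boys-minus-girls difference exceeds a hypothetical threshold of 2 joint standard errors, a common approximation to a confidence band about the identity line. The input format and the names `center` and `dif_flags` are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical input: item label -> (difficulty in logits, standard error)
Calibration = Dict[str, Tuple[float, float]]


def center(cal: Calibration) -> Calibration:
    """Shift difficulties so that their mean is zero logits."""
    mean = sum(d for d, _ in cal.values()) / len(cal)
    return {item: (d - mean, se) for item, (d, se) in cal.items()}


def dif_flags(boys: Calibration, girls: Calibration,
              z_crit: float = 2.0) -> Dict[str, str]:
    """Flag items whose difficulty differs between the groups by more than
    z_crit joint standard errors. "B": easier for boys (favors boys);
    "G": easier for girls (favors girls)."""
    common = sorted(set(boys) & set(girls))
    b = center({item: boys[item] for item in common})
    g = center({item: girls[item] for item in common})
    flags: Dict[str, str] = {}
    for item in common:
        (d_b, se_b), (d_g, se_g) = b[item], g[item]
        z = (d_b - d_g) / math.sqrt(se_b ** 2 + se_g ** 2)
        if z <= -z_crit:
            flags[item] = "B"  # lower difficulty for boys
        elif z >= z_crit:
            flags[item] = "G"  # lower difficulty for girls
    return flags
```

Items that pass this check fall inside the band; a flagged item still needs the kind of substantive reason, apart from the statistics, that the caption describes.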






'This is a BASIC program to perform UCON estimation on sets of responses by persons to items,
'allowing for non-administered items and extreme scores.
'The data file format is:
'  Cols 1-30 person identifier; Cols 31+ scored responses, one per column:
'  1 = correct, 0 = incorrect, other = ignore
INPUT "ENTER DATA FILE NAME:", DATAFILE$: OPEN DATAFILE$ FOR INPUT AS #1
INPUT "ENTER OUTPUT FILE NAME:", OUTFILE$: OPEN OUTFILE$ FOR OUTPUT AS #2
INPUT "ENTER NUMBER OF ITEMS:", IALL
INPUT "ENTER NUMBER OF PERSONS:", PALL
DIM RESPONSE(PALL, IALL)
DIM ICOUNT(IALL), IEXP(IALL), IINFIT(IALL), ILOGIT(IALL), IOUTFIT(IALL), ISCORE(IALL), ISE(IALL), IVAR(IALL)
DIM PCOUNT(PALL), PEXP(PALL), PINFIT(PALL), PLOGIT(PALL), POUTFIT(PALL), PSCORE(PALL), PSE(PALL), PVAR(PALL)
DIM PNAME$(PALL)
'Data areas are as follows:
'For each item or person: I is the item being calibrated, P is the person being measured.
'IALL, PALL is the number of items, persons in the data file
'ICOUNT(I), PCOUNT(P) is the count of responses (-1 = all correct, -2 = none correct)
'IEXP(I), PEXP(P) is the expected score
'IINFIT(I), PINFIT(P) is the information-weighted fit statistic ("infit")
'ILOGIT(I), PLOGIT(P) is the logit calibration or measure
'IOUTFIT(I), POUTFIT(P) is the outlier-sensitive unweighted fit statistic ("outfit")
'ISCORE(I), PSCORE(P) is the number of successes
'ISE(I), PSE(P) is the standard error of estimation
'ITOTAL, PTOTAL is the number of non-extreme items, persons
'IVAR(I), PVAR(P) is the summed binomial variance (statistical information) of the logit estimate
'PNAME$(P) is the person name
'RESPONSE(P, I) is the response by a person to an item
'BIAS is the estimation bias inherent in the UCON algorithm
'CONVERGED$ is a switch to let the program know if estimates have converged
'IMEAN is the sum of the logit calibrations of the items
'RECOUNT is the flag to recount the scores
'RESIDUAL is the difference between observed and expected responses or scores
'STANRES is the standardized residual
'SUCCESS is the probability of success by a person on an item
'VARIANCE is the product of the probabilities of success and failure by a person on an item
'Read in the data file
FOR P = 1 TO PALL
  LINE INPUT #1, L$: PNAME$(P) = MID$(L$, 1, 30)   'the person name
  FOR I = 1 TO IALL
    R$ = MID$(L$, 30 + I, 1)                        'the responses
    IF R$ <> "0" AND R$ <> "1" THEN
      RESPONSE(P, I) = -1                           'flag as to be ignored
    ELSE
      RESPONSE(P, I) = VAL(R$)
    END IF
  NEXT I
NEXT P

'Initialize the variables
ITOTAL = IALL: PTOTAL = PALL
FOR I = 1 TO IALL: ICOUNT(I) = 0: ISCORE(I) = 0: ILOGIT(I) = 0: NEXT I
FOR P = 1 TO PALL: PCOUNT(P) = 0: PSCORE(P) = 0: PLOGIT(P) = 0: NEXT P
'Accumulate the scores, allowing for ignored responses
'Recount if there are extreme scores
RECOUNT = -1
WHILE RECOUNT: RECOUNT = 0: PRINT "COUNTING..";
  FOR P = 1 TO PALL
    IF PCOUNT(P) >= 0 THEN
      FOR I = 1 TO IALL
        IF ICOUNT(I) >= 0 AND RESPONSE(P, I) >= 0 THEN
          'Count up how many responses there are for each item and person
          ICOUNT(I) = ICOUNT(I) + 1: PCOUNT(P) = PCOUNT(P) + 1
          'Count up the score for each item and person
          ISCORE(I) = ISCORE(I) + RESPONSE(P, I): PSCORE(P) = PSCORE(P) + RESPONSE(P, I)
        END IF
      NEXT I
      'Flag the extreme scores for persons
      IF PSCORE(P) = 0 THEN PCOUNT(P) = -2: RECOUNT = -1 'none correct
      IF PCOUNT(P) = PSCORE(P) THEN PCOUNT(P) = -1: RECOUNT = -1 'all correct
    END IF
  NEXT P
  'Flag the extreme scores for items
  FOR I = 1 TO IALL
    IF ICOUNT(I) >= 0 THEN
      IF ISCORE(I) = 0 THEN ICOUNT(I) = -2: RECOUNT = -1 'none correct
      IF ICOUNT(I) = ISCORE(I) THEN ICOUNT(I) = -1: RECOUNT = -1 'all correct
    END IF
  NEXT I
  'If anything was flagged, reset the remaining items and persons and count again
  IF RECOUNT THEN
    ITOTAL = 0: PTOTAL = 0
    FOR I = 1 TO IALL
      IF ICOUNT(I) >= 0 THEN ICOUNT(I) = 0: ISCORE(I) = 0: ITOTAL = ITOTAL + 1
    NEXT I
    FOR P = 1 TO PALL
      IF PCOUNT(P) >= 0 THEN PCOUNT(P) = 0: PSCORE(P) = 0: PTOTAL = PTOTAL + 1
    NEXT P
  END IF
WEND

'Initial assumption is that all items and persons are equal at zero logits
'Stop iterating, at convergence, when no observed score differs from its expected score by more than .1 score points
CONVERGED$ = "NO"
WHILE CONVERGED$ = "NO": CONVERGED$ = "YES": PRINT "ESTIMATING..";
  FOR I = 1 TO IALL: IEXP(I) = 0: IVAR(I) = 0: NEXT I
  FOR P = 1 TO PALL: PEXP(P) = 0: PVAR(P) = 0: NEXT P
  FOR P = 1 TO PALL
    IF PCOUNT(P) > 0 THEN
      FOR I = 1 TO IALL
        IF ICOUNT(I) > 0 AND RESPONSE(P, I) >= 0 THEN
          'Calculate the expected responses wherever a non-extreme person meets a non-extreme item
          SUCCESS = 1 / (1 + EXP(ILOGIT(I) - PLOGIT(P)))  'Probability of success
          IEXP(I) = IEXP(I) + SUCCESS         'Accumulate successes to give expected score
          PEXP(P) = PEXP(P) + SUCCESS
          VARIANCE = SUCCESS * (1 - SUCCESS)  'Binomial variance
          IVAR(I) = IVAR(I) + VARIANCE        'Accumulate variance of expectations
          PVAR(P) = PVAR(P) + VARIANCE
        END IF
      NEXT I
    END IF
  NEXT P
  'Re-estimate the item calibrations
  IMEAN = 0
  FOR I = 1 TO IALL
    IF ICOUNT(I) > 0 THEN
      'We have not converged if observed and expected scores differ by more than .1 score points
      RESIDUAL = ISCORE(I) - IEXP(I)
      IF ABS(RESIDUAL) > .1 THEN CONVERGED$ = "NO"
      'Adjust logit calibrations using a Newton-Raphson approach (adding 1 to the variance damps the step)
      ILOGIT(I) = ILOGIT(I) - RESIDUAL / (IVAR(I) + 1)
      'Accumulate item calibrations so we can determine their mean later
      IMEAN = IMEAN + ILOGIT(I)
    END IF
  NEXT I
  'Re-estimate the person measures
  FOR P = 1 TO PALL
    IF PCOUNT(P) > 0 THEN
      'We have not converged if observed and expected scores differ by more than .1 score points
      RESIDUAL = PSCORE(P) - PEXP(P)
      IF ABS(RESIDUAL) > .1 THEN CONVERGED$ = "NO"
      'Adjust logit measures using a Newton-Raphson approach (adding 1 to the variance damps the step)
      PLOGIT(P) = PLOGIT(P) + RESIDUAL / (PVAR(P) + 1)
    END IF
  NEXT P
  'Center item calibrations about zero logits
  FOR I = 1 TO IALL
    IF ICOUNT(I) > 0 THEN ILOGIT(I) = ILOGIT(I) - IMEAN / ITOTAL
  NEXT I
WEND



'Estimates OK, so calculate fit statistics using these estimates
FOR I = 1 TO IALL: IINFIT(I) = 0: IOUTFIT(I) = 0: IVAR(I) = 0: NEXT I
FOR P = 1 TO PALL: PINFIT(P) = 0: POUTFIT(P) = 0: PVAR(P) = 0: NEXT P
'Accumulate residuals wherever a non-extreme person meets a non-extreme item,
'and report responses whose standardized residual exceeds 2 in absolute value
PRINT #2, "UNEXPECTED RESPONSES"
PRINT #2, "PERSON", "ITEM", "RESPONSE", "EXPECTED", "RESIDUAL", "VARIANCE", "STAN. RES"
FOR P = 1 TO PALL
  IF PCOUNT(P) > 0 THEN
    FOR I = 1 TO IALL
      IF ICOUNT(I) > 0 AND RESPONSE(P, I) >= 0 THEN
        SUCCESS = 1 / (1 + EXP(ILOGIT(I) - PLOGIT(P)))
        VARIANCE = SUCCESS * (1 - SUCCESS)
        IVAR(I) = IVAR(I) + VARIANCE: PVAR(P) = PVAR(P) + VARIANCE
        RESIDUAL = RESPONSE(P, I) - SUCCESS
        STANRES = RESIDUAL / SQR(VARIANCE)
        'Accumulate the squared residual (for infit)
        IINFIT(I) = IINFIT(I) + RESIDUAL ^ 2: PINFIT(P) = PINFIT(P) + RESIDUAL ^ 2
        'Accumulate the squared standardized residual (for outfit)
        IOUTFIT(I) = IOUTFIT(I) + STANRES ^ 2: POUTFIT(P) = POUTFIT(P) + STANRES ^ 2
        IF ABS(STANRES) > 2 THEN PRINT #2, P, I, RESPONSE(P, I), SUCCESS, RESIDUAL, VARIANCE, STANRES
      END IF
    NEXT I
  END IF
NEXT P
'Calculate fit statistics for the items
FOR I = 1 TO IALL
  IF ICOUNT(I) > 0 THEN IINFIT(I) = IINFIT(I) / IVAR(I): IOUTFIT(I) = IOUTFIT(I) / ICOUNT(I)
NEXT I
'Calculate fit statistics for the persons
FOR P = 1 TO PALL
  IF PCOUNT(P) > 0 THEN PINFIT(P) = PINFIT(P) / PVAR(P): POUTFIT(P) = POUTFIT(P) / PCOUNT(P)
NEXT P
'Calculate the bias inherent in UCON estimation
BIAS = (ITOTAL - 1) / ITOTAL
'Now adjust measurements for this bias, and calculate standard errors
FOR I = 1 TO IALL
  IF ICOUNT(I) > 0 THEN ILOGIT(I) = ILOGIT(I) * BIAS: ISE(I) = BIAS / SQR(IVAR(I))
NEXT I
BIAS = (PTOTAL - 1) / PTOTAL
FOR P = 1 TO PALL
  IF PCOUNT(P) > 0 THEN PLOGIT(P) = PLOGIT(P) * BIAS: PSE(P) = BIAS / SQR(PVAR(P))
NEXT P

'Report of estimates obtained
PRINT #2, "ESTIMATES"
PRINT #2, "PERSON", "COUNT", "SCORE", "MEASURE", "S.E.", "INFIT", "OUTFIT", "NAME"
FOR P = 1 TO PALL
  IF PCOUNT(P) > 0 THEN
    PRINT #2, P, PCOUNT(P), PSCORE(P), PLOGIT(P), PSE(P), PINFIT(P), POUTFIT(P), PNAME$(P)
  ELSEIF PCOUNT(P) = -1 THEN
    PRINT #2, P, "MAXIMUM", PNAME$(P)     'all correct
  ELSE
    PRINT #2, P, "MINIMUM", PNAME$(P)     'none correct
  END IF
NEXT P
PRINT #2, "ITEM", "COUNT", "SCORE", "MEASURE", "S.E.", "INFIT", "OUTFIT"
FOR I = 1 TO IALL
  IF ICOUNT(I) > 0 THEN
    PRINT #2, I, ICOUNT(I), ISCORE(I), ILOGIT(I), ISE(I), IINFIT(I), IOUTFIT(I)
  ELSEIF ICOUNT(I) = -2 THEN
    PRINT #2, I, "MAXIMUM"                'none correct: hardest
  ELSE
    PRINT #2, I, "MINIMUM"                'all correct: easiest
  END IF
NEXT I
CLOSE: END
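For readers working outside BASIC, the core of the program above can be sketched in Python. This is a minimal re-expression under stated assumptions, not a replacement for the listing: missing responses are coded `None` instead of -1, the function names are hypothetical, and the `ValueError` guard is an addition. It keeps the program's choices: repeated trimming of extreme scores, the Newton-Raphson step damped by adding 1 to the variance, centering of item difficulties, the .1 score-point convergence criterion, and the (N - 1)/N bias correction applied to measures and standard errors.

```python
import math
from typing import Dict, Iterator, List, Optional, Tuple

# Rows are persons, columns are items: 1 = correct, 0 = incorrect,
# None = not administered (the BASIC program's "ignore" code).
Matrix = List[List[Optional[int]]]


def _observed(data: Matrix, persons, items) -> Iterator[Tuple[int, int, int]]:
    """Yield (person, item, response) for every non-missing response."""
    for p in persons:
        for i in items:
            if data[p][i] is not None:
                yield p, i, data[p][i]


def trim_extremes(data: Matrix) -> Tuple[List[int], List[int]]:
    """Repeatedly drop zero and perfect scores, like the COUNTING loop."""
    persons = set(range(len(data)))
    items = set(range(len(data[0]))) if data else set()
    while True:
        p_obs: Dict[int, List[int]] = {p: [] for p in persons}
        i_obs: Dict[int, List[int]] = {i: [] for i in items}
        for p, i, x in _observed(data, persons, items):
            p_obs[p].append(x)
            i_obs[i].append(x)
        keep_p = {p for p, xs in p_obs.items() if 0 < sum(xs) < len(xs)}
        keep_i = {i for i, xs in i_obs.items() if 0 < sum(xs) < len(xs)}
        if keep_p == persons and keep_i == items:
            return sorted(persons), sorted(items)
        persons, items = keep_p, keep_i


def _expectations(data, persons, items, b, d):
    """Expected scores and summed binomial variances under current estimates."""
    p_exp, p_var = dict.fromkeys(persons, 0.0), dict.fromkeys(persons, 0.0)
    i_exp, i_var = dict.fromkeys(items, 0.0), dict.fromkeys(items, 0.0)
    for p, i, _ in _observed(data, persons, items):
        prob = 1.0 / (1.0 + math.exp(d[i] - b[p]))  # probability of success
        var = prob * (1.0 - prob)                    # binomial variance
        p_exp[p] += prob
        i_exp[i] += prob
        p_var[p] += var
        i_var[i] += var
    return p_exp, p_var, i_exp, i_var


def ucon(data: Matrix, tol: float = 0.1, max_iter: int = 1000):
    """Return ({person: (measure, se)}, {item: (difficulty, se)}) in logits."""
    persons, items = trim_extremes(data)
    if len(persons) < 2 or len(items) < 2:
        raise ValueError("too few non-extreme persons or items")
    p_score, i_score = dict.fromkeys(persons, 0), dict.fromkeys(items, 0)
    for p, i, x in _observed(data, persons, items):
        p_score[p] += x
        i_score[i] += x
    b = dict.fromkeys(persons, 0.0)  # everything starts at zero logits
    d = dict.fromkeys(items, 0.0)
    for _ in range(max_iter):
        p_exp, p_var, i_exp, i_var = _expectations(data, persons, items, b, d)
        converged = True
        for i in items:
            resid = i_score[i] - i_exp[i]
            converged = converged and abs(resid) <= tol
            d[i] -= resid / (i_var[i] + 1.0)  # damped Newton-Raphson step
        for p in persons:
            resid = p_score[p] - p_exp[p]
            converged = converged and abs(resid) <= tol
            b[p] += resid / (p_var[p] + 1.0)
        mean_d = sum(d.values()) / len(d)
        for i in items:
            d[i] -= mean_d  # center item difficulties at zero logits
        if converged:
            break
    # Variances at the final estimates, then the (N - 1) / N bias correction.
    _, p_var, _, i_var = _expectations(data, persons, items, b, d)
    i_bias = (len(items) - 1) / len(items)
    p_bias = (len(persons) - 1) / len(persons)
    person_est = {p: (b[p] * p_bias, p_bias / math.sqrt(p_var[p])) for p in persons}
    item_est = {i: (d[i] * i_bias, i_bias / math.sqrt(i_var[i])) for i in items}
    return person_est, item_est
```

Calling `ucon(data)` returns person measures and item difficulties in logits, each with a standard error; extreme persons and items are simply absent from the result, where the BASIC program reports them as MAXIMUM or MINIMUM.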



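The fit section of the program can likewise be sketched in Python. This standalone function assumes that person measures and item difficulties (in logits) are already available from a UCON run, with missing responses coded `None`; the name `fit_statistics` and its return layout are hypothetical. As in the BASIC, infit is the sum of squared residuals divided by the summed binomial variance, outfit is the mean squared standardized residual, and responses with |z| > 2 are listed as unexpected.

```python
import math
from typing import Dict, List, Optional

Matrix = List[List[Optional[int]]]  # 1, 0, or None (not administered)


def fit_statistics(data: Matrix, measures: Dict[int, float],
                   difficulties: Dict[int, float], z_flag: float = 2.0):
    """Return (person_fit, item_fit, unexpected).

    person_fit and item_fit map an index to (infit, outfit) mean-squares;
    unexpected lists (person, item, response, expected, residual,
    variance, z) for every response with |z| > z_flag.
    """
    # Accumulators per person/item: [sum r^2, sum variance, sum z^2, count]
    p_acc = {p: [0.0, 0.0, 0.0, 0] for p in measures}
    i_acc = {i: [0.0, 0.0, 0.0, 0] for i in difficulties}
    unexpected = []
    for p, b in measures.items():
        for i, d in difficulties.items():
            x = data[p][i]
            if x is None:
                continue
            prob = 1.0 / (1.0 + math.exp(d - b))  # expected response
            var = prob * (1.0 - prob)             # binomial variance
            resid = x - prob
            z = resid / math.sqrt(var)            # standardized residual
            for acc in (p_acc[p], i_acc[i]):
                acc[0] += resid ** 2
                acc[1] += var
                acc[2] += z ** 2
                acc[3] += 1
            if abs(z) > z_flag:
                unexpected.append((p, i, x, prob, resid, var, z))

    def summarize(acc):
        # infit = sum r^2 / sum variance; outfit = mean of z^2
        return {k: (a[0] / a[1], a[2] / a[3]) for k, a in acc.items() if a[3] > 0}

    return summarize(p_acc), summarize(i_acc), unexpected
```

Both mean-squares have an expected value near 1 when the data fit the model; infit weights each response by its information, so it is less sensitive than outfit to single surprising answers on far-off items.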



